Machine Learning Methods: AI vs Non-AI Careers

Preparing and Preprocessing the Job Market Dataset

Authors

  • Connor Coulter, Boston University
  • Wei Wang, Boston University
  • Balqis Bevi Abdul Hannan Kanaga, Boston University

Published: October 9, 2025

Modified: October 9, 2025


Introduction: Group 5

About The Team

Team: Connor Coulter, Wei Wang, Balqis Bevi Abdul Hannan Kanaga

Topic: AI vs. Non-AI Job Growth — Is AI taking over or creating more jobs?

Course: AD 688 – Cloud Analytics for Business

This site hosts our research rationale, intro, and literature review for Project Selection I, II and III.

Import Data

Loaded dataset: (72498, 131)
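A minimal sketch of the load step, assuming the data is read with pandas from the `lightcast_job_postings.csv` file named later in this report. The helper name `load_postings` and the tiny in-memory CSV are illustrative stand-ins, not the team's actual code; on the real file the printed shape would be the `(72498, 131)` shown above.

```python
import io

import pandas as pd


def load_postings(source):
    """Read a job-postings CSV (path or file-like object) and report its shape."""
    # low_memory=False reads the file in one pass so column dtypes are inferred consistently.
    df = pd.read_csv(source, low_memory=False)
    print(f"Loaded dataset: {df.shape}")
    return df


# Toy stand-in for lightcast_job_postings.csv so the sketch runs anywhere.
toy_csv = io.StringIO(
    "TITLE,COMPANY_NAME,SALARY\n"
    "Data Analyst,Acme,85000\n"
    "ML Engineer,Globex,\n"
)
toy_df = load_postings(toy_csv)
```

For the real dataset, the call would be `load_postings("lightcast_job_postings.csv")`, with the path adjusted to wherever the file lives.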

Data Cleaning & Preprocessing

Drop Unnecessary Columns

Derived non-null: {'INDUSTRY_DISPLAY': np.int64(72454), 'SALARY_DISPLAY': np.int64(72498)}

Remaining columns (first 30): ['LAST_UPDATED_DATE', 'POSTED', 'EXPIRED', 'DURATION', 'SOURCE_TYPES', 'SOURCES', 'URL', 'MODELED_EXPIRED', 'MODELED_DURATION', 'COMPANY', 'COMPANY_NAME', 'COMPANY_IS_STAFFING', 'EDUCATION_LEVELS', 'EDUCATION_LEVELS_NAME', 'MIN_EDULEVELS', 'MIN_EDULEVELS_NAME', 'MAX_EDULEVELS', 'MAX_EDULEVELS_NAME', 'EMPLOYMENT_TYPE', 'EMPLOYMENT_TYPE_NAME', 'MIN_YEARS_EXPERIENCE', 'MAX_YEARS_EXPERIENCE', 'IS_INTERNSHIP', 'SALARY', 'REMOTE_TYPE', 'REMOTE_TYPE_NAME', 'ORIGINAL_PAY_PERIOD', 'SALARY_TO', 'SALARY_FROM', 'LOCATION']

Handle Missing Values

Remove Duplicates

Removed 3300 duplicates using ['TITLE', 'COMPANY_NAME', 'LOCATION', 'POSTED']
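The de-duplication step can be sketched with `DataFrame.drop_duplicates` on the four key columns listed in the output above. Keeping the first occurrence of each duplicate group is an assumption (it is the pandas default); the helper name and the toy rows are illustrative only.

```python
import pandas as pd

# Key columns taken from the output line above.
DEDUP_KEYS = ["TITLE", "COMPANY_NAME", "LOCATION", "POSTED"]


def remove_duplicates(df, keys=DEDUP_KEYS):
    """Drop rows that repeat the same title, company, location, and posting date."""
    before = len(df)
    # Assumption: keep the first row of each duplicate group.
    deduped = df.drop_duplicates(subset=keys, keep="first").reset_index(drop=True)
    print(f"Removed {before - len(deduped)} duplicates using {keys}")
    return deduped


# Toy data: the first two rows differ only in URL, so they count as duplicates.
toy = pd.DataFrame(
    {
        "TITLE": ["Data Analyst", "Data Analyst", "ML Engineer"],
        "COMPANY_NAME": ["Acme", "Acme", "Globex"],
        "LOCATION": ["Boston, MA", "Boston, MA", "Austin, TX"],
        "POSTED": ["2024-05-01", "2024-05-01", "2024-05-02"],
        "URL": ["a", "b", "c"],
    }
)
toy_deduped = remove_duplicates(toy)
```

Matching on a subset of columns, rather than on whole rows, catches the same posting scraped from several sources, since fields such as the URL differ between copies.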

Exploratory Data Analysis (EDA)

Visualizing and Interpreting Job Market Trends

Job Postings by Industry (Top 15)

Rationale

Highlights sectors where demand is concentrated, showing which industries are actively hiring.

Key Insights

  • Top Hiring Industries: Custom Computer Programming, Management Consulting, and Employment Agencies dominate job postings.
  • Skewed Distribution: The top 4 industries account for a significantly larger share of job postings than the rest.
  • Professional Services Focus: Many high-posting sectors are centered around tech, consulting, healthcare and education – reflecting demand for knowledge-based roles.
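The ranking behind a chart like this can be sketched as a simple count per industry. The column name `INDUSTRY_DISPLAY` comes from the preprocessing output earlier in this report; the helper name and toy data are assumptions for illustration.

```python
import pandas as pd


def top_industries(df, col="INDUSTRY_DISPLAY", n=15):
    """Return the n industries with the most postings and their share of all rows."""
    # value_counts sorts descending and ignores missing industries by default.
    counts = df[col].value_counts().head(n)
    share = (counts / len(df) * 100).round(1)
    return pd.DataFrame({"postings": counts, "share_pct": share})


# Toy data: three industries with 5, 3, and 2 postings.
toy = pd.DataFrame({"INDUSTRY_DISPLAY": ["A"] * 5 + ["B"] * 3 + ["C"] * 2})
toy_top = top_industries(toy, n=2)
print(toy_top)
```

The `share_pct` column makes the skew toward the top few industries explicit instead of leaving it to be read off bar lengths.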

Salary Distribution by Industry (Top 15)

Rationale

Shows where negotiation power exists and highlights industries paying well.

Key Insights

  • Wide Salary Ranges in Staffing & Tech Services: Industries like Temporary Help Services and Employment Placement Agencies exhibit large salary spreads with high outliers, though their median pay remains modest.
  • Stable Pay in Professional Sectors: Most industries maintain a consistent median salary around $100K-$150K, reflecting standardized compensation and less variation in negotiation power.
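One way to put numbers on "spread" versus "median" is to compute quartiles per industry. This is a sketch under assumptions: it uses the `SALARY` and `INDUSTRY_DISPLAY` columns seen in the preprocessing output, and the helper name and toy salaries are invented for illustration.

```python
import pandas as pd


def salary_summary(df, industry_col="INDUSTRY_DISPLAY", salary_col="SALARY", n=15):
    """Median and interquartile range of salary for the n most-posted industries."""
    top = df[industry_col].value_counts().head(n).index
    subset = df[df[industry_col].isin(top)].dropna(subset=[salary_col])
    # One row per industry, one column per quantile.
    q = subset.groupby(industry_col)[salary_col].quantile([0.25, 0.5, 0.75]).unstack()
    q.columns = ["q1", "median", "q3"]
    q["iqr"] = q["q3"] - q["q1"]
    return q.sort_values("median", ascending=False)


# Toy data: "Staffing" has a wide spread and a modest median; "Consulting" is tight.
toy = pd.DataFrame(
    {
        "INDUSTRY_DISPLAY": ["Staffing"] * 3 + ["Consulting"] * 3,
        "SALARY": [40_000, 60_000, 200_000, 100_000, 110_000, 120_000],
    }
)
summary = salary_summary(toy)
print(summary)
```

A large `iqr` next to a modest `median` is the pattern described above for staffing-type industries.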

Remote vs. On-Site Jobs

Rationale

Workplace flexibility is a major factor in today’s job market.

Key Insights

  • Limited Remote Availability: Only about 17% of job postings are labeled as Remote, with Hybrid Remote and Not Remote making up even smaller portions.
  • Data Gaps in Job Listings: A significant 78.3% of postings lack remote classification, indicating either incomplete employer data or inconsistent labeling, which may affect job seekers’ filtering and selection.
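Percentages like these can be computed by treating missing labels as their own category. The sketch below assumes the remote label lives in `REMOTE_TYPE_NAME` (a column listed in the preprocessing output) and that unclassified postings appear as missing values; if the dataset encodes them as a placeholder string instead, that string would need to be mapped first. The label "Unclassified" is a name chosen here.

```python
import pandas as pd


def remote_shares(df, col="REMOTE_TYPE_NAME"):
    """Percentage of postings per remote category, counting missing labels explicitly."""
    # Assumption: unclassified postings are stored as missing values (NaN/None).
    labels = df[col].fillna("Unclassified")
    return (labels.value_counts(normalize=True) * 100).round(1)


# Toy data: 10 postings, 6 of them without a remote label.
toy = pd.DataFrame(
    {"REMOTE_TYPE_NAME": ["Remote", "Remote", "Hybrid Remote", "Not Remote"] + [None] * 6}
)
toy_shares = remote_shares(toy)
print(toy_shares)
```

Keeping the unclassified share in the denominator matters: dropping those rows first would make the remote share look far larger than it is among all postings.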

Group 5 skill level

Compare our group’s skills against job market demand

Name     Python  SQL  Machine Learning  Cloud Computing  Docker  AWS
Connor   2       2    2                 2                0       0
Wei      1       2    1                 1                0       0
Balqis   3       4    2                 2                0       0
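The table above can be held in a small DataFrame to summarize the team's profile. The scores are copied from the table; the summary statistics (team average per skill, skills nobody has started) are an illustrative choice, and no market-demand figures are assumed here.

```python
import pandas as pd

# Self-assessed skill levels, copied from the table above.
skills = pd.DataFrame(
    {
        "Python": [2, 1, 3],
        "SQL": [2, 2, 4],
        "Machine Learning": [2, 1, 2],
        "Cloud Computing": [2, 1, 2],
        "Docker": [0, 0, 0],
        "AWS": [0, 0, 0],
    },
    index=pd.Index(["Connor", "Wei", "Balqis"], name="Name"),
)

# Average level per skill across the team.
team_mean = skills.mean().round(2)

# Skills where every member scored 0.
untouched = skills.columns[(skills == 0).all()].tolist()

print(team_mean)
print("No experience yet:", untouched)
```

Once demand counts per skill are extracted from the postings, they could be joined to `team_mean` by skill name to make the comparison direct.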

Improvement Plan

  • Balqis: Her Machine Learning and Cloud Computing are at a basic level, leaving room to grow. With a career in data analysis and visualization, Machine Learning isn’t her top priority, but Cloud Computing is worth developing further. Strengthening Python would also be valuable, as it’s essential for data analysts. A good approach is to sharpen her skills through small personal projects and apply what she learns at work. If her fundamentals feel solid, she can move towards certifications.

  • Wei: Her Python and Machine Learning are at a basic level, so she has the option to develop them further depending on how relevant they are to her career path. Since her SQL is already stronger, focusing on Python would be the most practical next step if she chooses to continue building technical skills. A good approach is to take it gradually through small projects and applied practice, and then expand into more advanced areas only if it fits her goals.

  • Connor: His skills are even across Python, SQL, Machine Learning, and Cloud Computing, all at a basic stage, which gives him room to build depth. Bumping Python up to a stronger level would give him the most flexibility, while also continuing to grow in Cloud Computing to keep pace with current tools and workflows. A steady way forward is to practice Python through hands-on work and then bring in cloud tools as he becomes more confident.

Introduction

This project explores AI vs Non-AI careers using the lightcast_job_postings.csv dataset.
We apply clustering, regression, and classification to evaluate trends in job markets, with a focus on salary, experience, and employability.
The goal is to help job seekers understand how AI is shaping opportunities in 2024.

Analysis

Load dataset

Original dataset: 72498 rows
After removing missing salary and years_experience: 23697 rows
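This filter keeps roughly a third of the postings (23,697 of 72,498). A minimal sketch of the step follows, assuming salary is in `SALARY` and experience in `MIN_YEARS_EXPERIENCE`; the output above only says "years_experience", so the exact experience column is a guess, and the helper name is hypothetical.

```python
import pandas as pd


def keep_complete_rows(df, required=("SALARY", "MIN_YEARS_EXPERIENCE")):
    """Keep only rows where every required modeling column is present."""
    print(f"Original dataset: {len(df)} rows")
    complete = df.dropna(subset=list(required)).reset_index(drop=True)
    print(f"After removing missing {', '.join(required)}: {len(complete)} rows")
    return complete


# Toy data: only the first and last rows have both fields.
toy = pd.DataFrame(
    {
        "SALARY": [90_000, None, 120_000, 75_000],
        "MIN_YEARS_EXPERIENCE": [2, 5, None, 1],
    }
)
toy_complete = keep_complete_rows(toy)
```

Because two thirds of the rows are dropped, any model trained on the remainder describes postings that disclose salary and experience, which may differ systematically from those that do not.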

Classification: AI vs Non-AI Jobs
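The report does not show how postings were labeled or which classifier was used, so the following is only a sketch of one plausible setup: flag a posting as AI-related when its title matches a keyword list, then fit a logistic regression on salary and experience. The keyword list, the `IS_AI` column, the feature choice, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical keyword list; the project's real labeling rule is not shown.
AI_KEYWORDS = ("machine learning", "artificial intelligence", "data scientist")


def flag_ai(titles):
    """Return 1 where a job title contains any AI keyword, else 0."""
    pattern = "|".join(AI_KEYWORDS)
    return titles.str.lower().str.contains(pattern, regex=True, na=False).astype(int)


# Synthetic postings: AI titles are given a higher average salary by construction.
rng = np.random.default_rng(0)
n = 200
is_ai = rng.integers(0, 2, size=n)
toy = pd.DataFrame(
    {
        "TITLE": np.where(is_ai == 1, "Machine Learning Engineer", "Office Manager"),
        "SALARY": 90_000 + 40_000 * is_ai + rng.normal(0, 10_000, size=n),
        "MIN_YEARS_EXPERIENCE": rng.integers(0, 10, size=n),
    }
)
toy["IS_AI"] = flag_ai(toy["TITLE"])

X = toy[["SALARY", "MIN_YEARS_EXPERIENCE"]]
y = toy["IS_AI"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features first so salary (tens of thousands) does not dominate experience (single digits).
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data, where the salary gap is built in; it says nothing about performance on the real postings.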

Clustering: Job Segmentation
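As with classification, the clustering setup is not shown in the report. A common approach, sketched here as an assumption, is k-means on standardized salary and experience. The feature columns, the default `k=3`, the helper name, and the synthetic data are illustrative choices, not the team's configuration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_jobs(df, features=("SALARY", "MIN_YEARS_EXPERIENCE"), k=3, seed=0):
    """Assign each posting to one of k clusters based on standardized features."""
    # Standardize so both features contribute comparably to Euclidean distance.
    X = StandardScaler().fit_transform(df[list(features)])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    out = df.copy()
    out["CLUSTER"] = labels
    return out


# Synthetic postings: 30 junior/lower-pay rows followed by 30 senior/higher-pay rows.
rng = np.random.default_rng(1)
toy = pd.DataFrame(
    {
        "SALARY": np.concatenate(
            [rng.normal(60_000, 3_000, 30), rng.normal(150_000, 3_000, 30)]
        ),
        "MIN_YEARS_EXPERIENCE": np.concatenate(
            [rng.normal(1, 0.3, 30), rng.normal(8, 0.3, 30)]
        ),
    }
)
segmented = segment_jobs(toy, k=2)

# Median profile of each cluster, useful for naming the segments.
profile = segmented.groupby("CLUSTER")[["SALARY", "MIN_YEARS_EXPERIENCE"]].median()
print(profile)
```

Cluster numbers are arbitrary labels; the per-cluster medians in `profile` are what give each segment a readable meaning, such as "entry-level" versus "senior, high-pay".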

Visualizations

Insights for Job Seekers

  • AI roles often cluster at higher salaries compared to non-AI roles.
  • Experience remains critical — higher years of experience align with higher pay clusters.
  • Industries with strong AI adoption (e.g., tech, finance) show clearer salary advantages.

Takeaways:

  • Highlight AI-related skills to access higher-paying roles.
  • Leverage industry trends to target fields with high AI adoption.
  • Use clustering insights to understand where your profile fits (AI-heavy vs. traditional roles).